



ISSN (e): 2250-3005 || Vol. 04 || Issue 5 || May 2014 ||
International Journal of Computational Engineering Research (IJCER) 



Experimental Studies of the Statistical Properties of Network
Traffic Based on the BDS-Statistics

Alexey Smirnov 1, Dmitriy Danilenko 2

1 Professor in the Department of Software, Kirovohrad National Technical University, Kirovohrad, Ukraine
2 Graduate student in the Department of Software, Kirovohrad National Technical University, Kirovohrad, Ukraine



Abstract: 

Experiments have been conducted that yield the results of a correlation analysis of network traffic based
on the BDS-test; these results may be used as part of the analytical component of modern anti-virus
systems. In addition, the correlation analysis of network traffic may be used to organize the main
elements of a system for monitoring network activity: the sensor subsystem (sensors that collect
traffic information) and the analytical part (decision-making module). A procedure for assessing the
significance of differences between two or more samples (series) of independent observations of
network traffic (the Wilcoxon criterion) is used to detect individual services of a telecommunications
system from the observed network traffic. It makes it possible to establish, with a given accuracy and
reliability, whether different data streams belong to the same general totality (statistical population).

Keywords: telecommunication systems and networks, system of intrusion detection and prevention,
BDS-statistics



I. INTRODUCTION 

To ensure the security of modern telecommunication networks, so-called intrusion detection systems
(IDS) and intrusion prevention systems (IPS) are used [1-14]. Their operation is based on the collection,
analysis and processing of information about events related to the security perimeter of the
telecommunications network, the accumulation of the collected data, the monitoring of the network activity of
individual services, decisions on the status of the protected system, and the identification and countering of
possible unauthorized use of information and communication resources [2]. One direction for improving
intrusion detection and prevention systems is the study of anomalies in telecommunication systems
(Anomaly-Based Intrusion Detection and Prevention Systems, AB IDPS), which is based on statistical analysis of
network traffic [2]. Within this approach, the IDPS defines the "normal" network activity of individual
information services of a telecommunication system; all traffic that does not fall under the definition of
"normal" is then marked as "anomalous".

The analysis of correlation methods for the identification of objects showed that one of the most
effective approaches for identifying dependencies in data traffic is the BDS-statistics, which is constructed on
the basis of BDS-tests (BDS-methods). BDS-tests are effective methods for identifying dependencies in time
series. Their aim is to test the null hypothesis $H_0$ that the values of the time series
$\xi = (\xi_1, \xi_2, \ldots, \xi_N)$ are independent and identically distributed, using a significance
criterion. According to this criterion, to test the hypothesis $H_0$ it is necessary to choose a critical domain
$G_\alpha$ satisfying the condition $P(g \in G_\alpha) = \alpha$, where $g(\xi_1, \xi_2, \ldots, \xi_N)$ is the
observation statistic and $\alpha$ is an adjustable significance level [17-20].

II. BDS-STATISTICS DESCRIPTION 

The BDS-test is based on the value of the statistic $w(\xi)$ (the BDS-statistics) [17-20]:

$$w_{m,N}(\varepsilon) = \sqrt{N-m+1}\;\frac{C_{m,N}(\varepsilon) - \left(C_{1,N-m}(\varepsilon)\right)^{m}}{\sigma_{m,N}(\varepsilon)},$$

where $C_{m,N}(\varepsilon) - (C_{1,N-m}(\varepsilon))^{m}$ (the numerator of the BDS-statistics) is determined by the
correlation integrals $C_{m,N}(\varepsilon)$, $C_{1,N}(\varepsilon)$ for the dimension $m$; $\varepsilon$ is the
radius of the hypersphere; $\sigma_{m,N}(\varepsilon)$ is the standard deviation of the difference
$C_{m,N}(\varepsilon) - (C_{1,N-m}(\varepsilon))^{m}$; $N$ is the number of elements of the time series.

A number of studies [17-20] have proposed "simplified" algorithms for estimating the BDS-statistics. In
them, to calculate $C_{m,N}(\varepsilon)$ ($m > 1$), it is necessary to "embed" the time series into an
m-dimensional pseudo-phase space whose elements, by the theorem of Takens [19], are the points
$\xi_i^m = (\xi_i, \xi_{i+1}, \ldots, \xi_{i+m-1})$ with coordinates given by m successive values of the
original time series. The correlation integral determines the frequency with which any pair of phase-space
points falls within a hypersphere of radius $\varepsilon$:

$$C_{m,N}(\varepsilon) = \frac{2}{(N-m+1)(N-m)} \sum_{s=1}^{N-m+1} \; \sum_{t=s+1}^{N-m+1} \; \prod_{j=0}^{m-1} I_\varepsilon(\xi_{s+j}, \xi_{t+j}),$$

$$I_\varepsilon(\xi_i, \xi_j) = \begin{cases} 1, & |\xi_i - \xi_j| < \varepsilon, \\ 0, & |\xi_i - \xi_j| > \varepsilon, \end{cases} \qquad 0 < i \le N \ \text{and} \ 0 < j \le N,$$

where $I_\varepsilon(\xi_i^m, \xi_j^m)$ is the Heaviside function for all pairs of values i and j.
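
The definition of the correlation integral can be illustrated with a short sketch. The Python code below is a minimal illustration under stated assumptions, not the authors' implementation: the function names `embed` and `correlation_integral` are hypothetical, two embedded points are assumed to be "close" when every pair of coordinates differs by less than ε, and the sums are assumed to run over the N - m + 1 embedded points.

```python
from itertools import combinations


def embed(series, m):
    """Return the m-histories (x[i], ..., x[i+m-1]) of the series."""
    return [tuple(series[i:i + m]) for i in range(len(series) - m + 1)]


def correlation_integral(series, m, eps):
    """Fraction of pairs of m-histories whose coordinates all differ by less than eps."""
    points = embed(series, m)
    n_points = len(points)  # equals N - m + 1
    if n_points < 2:
        raise ValueError("series too short for this embedding dimension")
    close = sum(
        1
        for p, q in combinations(points, 2)
        if all(abs(a - b) < eps for a, b in zip(p, q))
    )
    # Normalisation 2 / ((N - m + 1)(N - m)): the number of distinct pairs.
    return 2.0 * close / (n_points * (n_points - 1))
```

For a series of N = 1000 values the loop visits roughly half a million pairs, which is acceptable for a sketch but is exactly the cost that fast algorithms try to avoid.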

The value of the correlation integral approaches a definite limit as $\varepsilon$ decreases. The analysis of
studies [17-20] showed that there is a range of $\varepsilon$ values that allows calculations to be performed
with the specified accuracy. This range depends on the number of elements N of the time series. If $\varepsilon$
is too small, there will not be enough points to capture the statistical structure; if $\varepsilon$ is too large,
there will be too many points.

The studies [17-20] recommend choosing $\varepsilon$ in the range from $0.5\sigma$ to $2\sigma$, where $\sigma$
is the standard deviation of the process $\{\xi_i\}_{i=1}^{N}$. In accordance with the theory of statistics, the
dependence of the correlation integral on $\varepsilon$ is as follows:

$$C_{m,N}(\varepsilon) \sim \varepsilon^{D_c},$$

where $D_c$ is the correlation dimension of the time series.
For m = 1 we have:

$$C_{1,N}(\varepsilon) = \frac{2}{N(N-1)} \sum_{s=1}^{N} \sum_{t=s+1}^{N} I_\varepsilon(\xi_s, \xi_t).$$
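
The recommended range for the radius can be turned into concrete candidate values for a given traffic series. The following Python sketch is only an illustration: the helper name `radius_candidates` and the default multipliers are assumptions, and the population standard deviation is used because the text does not say which estimator of σ was applied.

```python
from statistics import pstdev


def radius_candidates(series, factors=(0.5, 1.0, 1.5, 2.0)):
    """Return candidate radii eps = factor * sigma for the given series."""
    sigma = pstdev(series)  # population standard deviation (an assumption)
    return [factor * sigma for factor in factors]
```

The tables in this paper use the two radii 0.5σ and σ, which correspond to the first two default multipliers.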

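The studies performed have shown that when $N \to \infty$ the correlation integral
$C_{m,N}(\varepsilon) \to (C_{1,N}(\varepsilon))^{m}$, and the value
$\left(C_{m,N}(\varepsilon) - (C_{1,N}(\varepsilon))^{m}\right)\sqrt{N-m+1}$ is an asymptotically normally
distributed random variable with zero mean and standard deviation $\sigma_{m,N}(\varepsilon)$, which is defined as:

$$\sigma_{m,N}(\varepsilon) = 2\sqrt{k^{m} + 2\sum_{j=1}^{m-1} k^{m-j}\,(C_{1,N}(\varepsilon))^{2j} + (m-1)^{2}\,(C_{1,N}(\varepsilon))^{2m} - m^{2}\,k\,(C_{1,N}(\varepsilon))^{2m-2}},$$

where

$$k = \frac{1}{N(N-1)(N-2)} \left[ \sum_{s=1}^{N} \left( \sum_{t=1}^{N} I_\varepsilon(\xi_s, \xi_t) \right)^{2} - 3 \sum_{s=1}^{N} \sum_{t=1}^{N} I_\varepsilon(\xi_s, \xi_t) + 2N \right].$$

The BDS-statistics $w(\xi)$ is a normally distributed random variable provided that the estimate
$\hat{\sigma}_{m,N}(\varepsilon)$ is close to its theoretical value $\sigma_{m,N}(\varepsilon)$.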
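
To show how the pieces of the BDS-statistics fit together, the following Python sketch computes w for a given series, embedding dimension m and radius ε. It is a plain, quadratic-time illustration and not the authors' implementation or one of the fast algorithms of [17-20]. The function name `bds_statistic` is hypothetical. Two further assumptions are made: the one-dimensional integral C_1 is computed over all N values, and k is computed from the row sums of the full N x N indicator matrix (diagonal included) as (sum of squared row sums - 3 * sum of all entries + 2N) / (N(N-1)(N-2)).

```python
import math


def bds_statistic(series, m, eps):
    """Return the BDS statistic w for embedding dimension m and radius eps."""
    n = len(series)
    if m < 2 or n < m + 2:
        raise ValueError("need m >= 2 and at least m + 2 observations")

    # Indicator matrix: ind[s][t] = 1 when |x_s - x_t| < eps (diagonal included).
    ind = [[1 if abs(a - b) < eps else 0 for b in series] for a in series]

    # C_1: fraction of close pairs s < t among all N values.
    close_1 = sum(ind[s][t] for s in range(n) for t in range(s + 1, n))
    c1 = 2.0 * close_1 / (n * (n - 1))

    # k: row-sum form described in the lead-in.
    row_sums = [sum(row) for row in ind]
    k = (sum(r * r for r in row_sums) - 3 * sum(row_sums) + 2 * n) / (
        n * (n - 1) * (n - 2)
    )

    # C_m: fraction of close pairs among the N - m + 1 embedded points.
    n_m = n - m + 1
    close_m = sum(
        1
        for s in range(n_m)
        for t in range(s + 1, n_m)
        if all(ind[s + j][t + j] for j in range(m))
    )
    c_m = 2.0 * close_m / (n_m * (n_m - 1))

    # Variance of sqrt(N - m + 1) * (C_m - C_1^m) under the null hypothesis.
    variance = 4.0 * (
        k ** m
        + 2.0 * sum(k ** (m - j) * c1 ** (2 * j) for j in range(1, m))
        + (m - 1) ** 2 * c1 ** (2 * m)
        - m ** 2 * k * c1 ** (2 * m - 2)
    )
    if variance <= 0.0:
        raise ValueError("degenerate variance estimate; try a different eps")

    return math.sqrt(n_m) * (c_m - c1 ** m) / math.sqrt(variance)
```

A constant series makes every pair "close", the variance estimate collapses to zero, and the sketch raises an error instead of dividing by zero.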

The problem of detecting a chaotic signal is considered as a non-parametric test of one of
two hypotheses:

1) $H_0$: the observed data (data traffic) $\xi = (\xi_1, \xi_2, \ldots, \xi_N)$ are independent and identically
distributed, i.e. the distribution function (density) factorizes:
$F_N(\xi_1, \xi_2, \ldots, \xi_N) = \prod_{i=1}^{N} F(\xi_i)$;

2) $H_1$: the data obtained as the result of the experiment (traffic information) have a certain
relationship (the process is structured).

Under the hypothesis $H_0$ the statistic $w(\xi)$ is asymptotically distributed as N(0,1) as the number
of observations tends to infinity. A number of studies [17-20] support the need for a pilot sample of more
than 500 observations. Such a number of observations makes it possible to argue for
the reliability of the results obtained.

The studies have shown that the criterion for the validity of the hypothesis $H_0$ (the absence of any
dependencies in the data traffic) is the inequality:

$$|w_{m,N}(\varepsilon)| < 1.96.$$

For the statistic $w_{m,N}(\varepsilon)$ this value corresponds to the significance level
$\alpha = 0.05$ (the probability of a type I error), and when the above inequality holds, the hypothesis $H_0$
(i.i.d.) is accepted with probability $P_{H_0} \approx 0.95$.
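
The threshold 1.96 is the two-sided quantile of the standard normal distribution for α = 0.05. The short Python sketch below shows where the number comes from and how the decision rule can be applied; the function names are hypothetical, and `statistics.NormalDist` from the standard library (Python 3.8 or later) is used for the quantile.

```python
from statistics import NormalDist


def critical_value(alpha=0.05):
    """Two-sided critical value of N(0, 1) for significance level alpha."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def accept_iid(w, alpha=0.05):
    """Return True when |w| stays below the critical value, i.e. H0 is not rejected."""
    return abs(w) < critical_value(alpha)
```

With this rule a statistic of about 13.5, such as the values reported later for HTTP traffic, leads to rejection of the i.i.d. hypothesis.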

In the case where the alternative hypothesis $H_1$ is true, the distribution of the test statistic $w(\xi)$
changes. Therefore, when testing statistical hypotheses it is insufficient to focus only on the value of the
significance level $\alpha$.

When the alternative hypothesis $H_1$ is considered, which implies a dependence (possibly nonlinear) in the
time series once the first differences of the natural logarithms have been taken, the power of the criterion
$1 - \beta$, or the probability of a type II error $\beta$, should be determined. The power of the criterion is
the probability of accepting the alternative hypothesis $H_1$ when applying the criterion $w(\xi)$, provided that
it is true, that is, its ability to detect an existing deviation from the null hypothesis. Obviously, for a fixed
type I error (we set it ourselves, and it does not depend on the properties of the criterion), the greater the
power of the criterion (i.e., the smaller the type II error), the better the criterion. To calculate the power of
the criterion $1 - \beta$, where $\beta = P\{w(\xi) \notin G_\alpha \mid H_1\}$ and $G_\alpha$ is the critical
area at a given significance level $\alpha$, it is necessary to know the conditional density
$p(w(\xi) \mid H_1)$. In practice, the power of the criterion (test) is determined empirically.

For the experiment, and to improve the reliability of the results, it is necessary to choose an
embedding dimension m for which the reconstructed phase space is neither "too sparse" nor "too crowded." A
number of studies [ ] recommend m = 6 in experiments.

Thus, the analysis of different approaches to statistical testing showed that the BDS-test
makes it possible to detect various types of deviations from independence and identical distribution
and can serve as a general model test for the classification of processes (time series) $\xi$, especially in the
presence of nonlinear dynamics.

The nonparametric nature of BDS-testing may be considered its main feature. This is reflected
in the fact that the BDS-test uses, as the statistic of the observations, a nonlinear function $w(\xi)$ whose
distribution is independent of the distribution of the observed values $\xi$. In this case, we are able to obtain
some information about the multidimensional distribution function (density) $F_N(\xi_1, \xi_2, \ldots, \xi_N)$ by
analyzing the one-dimensional empirical distribution function $p(w)$ of the statistic $w$.

The BDS-test may be calculated by various methods, and many implementations are known; this
technique proposes using the implementation suggested by the authors of the BDS-test (W. A. Brock, W.
D. Dechert and J. A. Scheinkman). Fast algorithms for calculating the BDS-statistics have also been proposed
[17-20]. Below are the results of experimental studies of the statistical properties of network traffic based on
the correlation analysis of time series (BDS-testing). The results include data for the most popular
network protocols and services.

III. EXPERIMENTAL STUDIES OF THE NETWORK TRAFFIC STATISTICAL
PROPERTIES BASED ON THE CORRELATION ANALYSIS OF TIME SERIES

The sample size for the study of the statistical properties based on the correlation analysis of time series
is N = 1000; the calculations have been performed with different parameters (the results are shown in the
respective tables).

HTTP is an application-level protocol for data transfer (initially, in the form of hypertext documents). HTTP is
based on the "client-server" technology, which assumes consumers (clients) that initiate a connection and send a
request, and providers (servers) that wait for a connection request, perform the necessary actions and return a
message with the result. The analysis of the experimental HTTP traffic data shows that the phase portrait of the
network traffic for data transmission over HTTP exhibits a dependency: most of the points are grouped in a
certain area. Table 1 shows the values of the BDS-test with different sets of parameters.



Table 1 - Results of the BDS-test for experimental HTTP traffic data with different parameters m and ε

Traffic   m   ε = 0.5σ   ε = σ
HTTP      4   13.52      12.69
          5   13.97      11.554
          6   13.87      10.65
          7   13.58      9.92


Thus, as follows from the experimental data obtained for this kind of HTTP traffic, the characteristic
values of the BDS-test are as follows:

- for the radius ε = 0.5σ and m = 4...7 the values range from ≈ 13.5 to ≈ 14;

- for the radius ε = σ there is a greater variation, with values ranging from 9.92 to 12.69.

These values may be used as test (reference) values for detecting the traffic type and the
corresponding network service.

Let's process the experimental data obtained for FTP traffic. FTP is a protocol used to transfer files over
computer networks. It allows connecting to FTP servers, browsing the contents of directories, and downloading
files from the server or uploading them to it; in addition, a mode of file transfer between servers is
possible. FTP is an application-layer protocol and uses the TCP transport protocol for data transfer. In
contrast to most other protocols, commands and data are transmitted on different ports: port 20, opened on the
server side, is used to transmit data, and port 21 is used to send commands. The port on which the client
receives data is determined during the negotiation dialogue. If a file transfer is interrupted for any reason,
the protocol provides a means to download the rest of the file, which is very useful when transferring large
files.

In the fragment of experimental network traffic for FTP, the transition from a small number of packets
(transmission of protocol commands, retrieval of directory listings, etc.) to the active download of files at a
certain speed level can be clearly seen. Since the server used had a download speed limit significantly lower
than the network bandwidth, "bursts" that rapidly decrease to the speed limit threshold are sometimes observed.

Groupings of points that indicate the presence of dependencies in the source data are observed in the
phase portrait. Table 2 shows the values of the BDS-test with different sets of parameters.

The data of Table 2 may be used as reference values for FTP traffic in an intrusion detection system
for telecommunications systems and networks.



Table 2 - Values of the BDS-test for experimental FTP traffic data with different parameters m and ε

Traffic   m   ε = 0.5σ   ε = σ
FTP       4   19.5       17.69
          5   17.81      16.13
          6   16.49      14.91
          7   15.437     13.95




Let's process the experimental data obtained for Skype traffic. Skype is free closed-source software
that enables encrypted voice communication over the Internet between computers (VoIP), as well as paid
services for calls to mobile phones and landlines. The Skype application also supports conference calls and
video calls, and provides text messaging (chat) and file transfer. It is also possible to transmit an image of the
screen instead of an image from a webcam.

In the phase portrait for Skype traffic, two areas around which the majority of the points are grouped are
observed, which indicates the presence of dependencies in the original sequence. Table 3 shows the values
of the BDS-test for Skype traffic with different sets of parameters.

Table 3 - Values of the BDS-test for experimental Skype traffic data with different parameters m and ε

Traffic   m   ε = 0.5σ   ε = σ
Skype     4   16.49      180.11
          5   15.05      163.77
          6   13.95      150.94
          7   13         140.58




The results of the Skype traffic studies listed in Table 3 may be used as reference values in
intrusion detection systems for communication systems and networks.

Let's process the experimental data obtained for streaming video traffic. Streaming video is multimedia
data that a user receives continuously from a streaming provider. Currently, this service is
very popular, and its volume is about half of the traffic transmitted on the Internet.

On the corresponding fragment of network traffic, many bursts and drops characteristic of streaming
video are observed. Two areas may be identified in the phase portrait. The first has a distribution close to
random. In the other area, however, the points are explicitly grouped with sufficiently high accuracy,
indicating the presence of dependencies in the original sequence. Table 4 shows the values of the BDS-test with
different sets of parameters for the experimental streaming video data.



Table 4 - Values of the BDS-test for experimental streaming video data with different parameters m and ε

Traffic           m   ε = 0.5σ   ε = σ
Streaming video   4   32.5       18.2
                  5   32.254     19.28
                  6   38.45      20.39
                  7   41.42      28.49




The analysis of the data in Table 4 shows that the values of the BDS-test for streaming video lie in a
fairly wide range, but they can still be used as a reference.

Thus, the results of experimental studies of the statistical properties of network traffic using correlation
analysis of time series show that specific values of the BDS-test can be determined for different services of
telecommunication systems and networks. The calculations confirm the theoretical assumption that for different
types of traffic the BDS-test gives different values that can be taken as a reference.

Table 5 shows the average values corresponding to the different traffic types for different values
of ε, which allows the network services to be identified. For example, for ε = 0.5σ and ε = σ the
averaged values of the BDS-test for each type of traffic can be distinguished, and on this basis a process for
detecting the network activity of a separate service of a telecommunications network can be organized.

It should be noted that in a real telecommunication network popular services may be used
simultaneously, which changes the values of the BDS-test. However, an unauthorized network intrusion
also changes the statistical properties of the network traffic and the corresponding values of the
BDS-statistics. Furthermore, even in combined operation some specific service, as a rule, prevails, which
allows it to be singled out. Traces of the traffic generated by malicious software, viruses, etc. change the
value of the test to a degree sufficient to make a decision about suspicious traffic, i.e. to detect possible
traces of virus traffic in the general flow.

Table 5 - Average values of the BDS-test for different services of a telecommunication network

                       Average values of the BDS-test
Service type           ε = 0.5σ   ε = σ
HTTP                   13.7       11.2
Skype                  14.6       171.9
Multiservice traffic   11.5       13.5
FTP                    17.3       15.7
Streaming video        36.2       21.6
Malicious software     40.7       28.5




Thus, the experimental data confirm the theoretical assumption about the possibility of using the values 
of the BDS-test to detect traces of malicious software in the network traffic. 
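
One simple way to use the averaged values of Table 5 as references is a nearest-reference lookup. The Python sketch below is purely illustrative and is not described in the paper: the function name `closest_service` and the choice of distance (the sum of relative deviations at the two radii) are assumptions, while the reference pairs are the averages from Table 5.

```python
# Average BDS-test values from Table 5: (eps = 0.5 * sigma, eps = sigma).
REFERENCE = {
    "HTTP": (13.7, 11.2),
    "Skype": (14.6, 171.9),
    "Multiservice traffic": (11.5, 13.5),
    "FTP": (17.3, 15.7),
    "Streaming video": (36.2, 21.6),
    "Malicious software": (40.7, 28.5),
}


def closest_service(w_half_sigma, w_sigma):
    """Return the Table 5 entry whose reference values are nearest to the observed pair."""

    def relative_distance(ref):
        # Hypothetical metric: sum of relative deviations at both radii.
        return (abs(w_half_sigma - ref[0]) / ref[0]
                + abs(w_sigma - ref[1]) / ref[1])

    return min(REFERENCE, key=lambda name: relative_distance(REFERENCE[name]))
```

For example, the pair (13.52, 12.69) measured for HTTP at m = 4 maps back to the HTTP entry. A practical detector would also need a rejection threshold for observations that are far from every reference.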

On this basis, the correlation analysis of network traffic based on BDS-testing can be used as
follows. First, as part of the analytical component of modern anti-virus systems. Second, the correlation analysis
of network traffic can be used to organize the main elements of a system for monitoring network activity: the
sensor subsystem (sensors that collect information about the traffic) and the analytical part (decision-making
module).

When building a system for monitoring network activity, it is necessary to solve the problem of
detecting individual services of a telecommunications system from the observed network traffic. Let's
consider the problem of assessing the significance of differences between two or more samples (series) of
independent observations of network traffic in order to establish (with a given accuracy and reliability) whether
they belong to the same general totality. For this, let's use the mathematical apparatus of statistical research
and a criterion of belonging of two samples to one and the same general totality (the Wilcoxon criterion).

IV. EXPERIMENTAL STUDIES OF BELONGING OF TWO SAMPLES OF NETWORK 
TRAFFIC TO ONE AND THE SAME GENERAL TOTALITY 

In the study of complex technical systems, the problem of assessing the significance of differences between
two or more samples (series) of independent observations often arises, i.e. it is necessary to establish (with a
given accuracy and reliability) whether they belong to the same general totality. Let's suppose there are two
samples:

$$x_1, x_2, \ldots, x_{N_1} \qquad (1)$$

and

$$y_1, y_2, \ldots, y_{N_2} \qquad (2)$$

of random variables X and Y, with distributions $P_X(t)$ and $P_Y(t)$, respectively.
Let's assume that the observed $x_i$ and $y_i$ give different values of the sample means

$$x^{*} = (x_1 + x_2 + \ldots + x_{N_1}) / N_1, \qquad (3)$$

$$y^{*} = (y_1 + y_2 + \ldots + y_{N_2}) / N_2, \qquad (4)$$

$$x^{*} \neq y^{*},$$

and/or of the sample dispersions (variances)

$$\tilde{\sigma}_X^{2} = \frac{1}{N_1} \sum_{i=1}^{N_1} (x_i - x^{*})^{2}, \qquad (5)$$

$$\tilde{\sigma}_Y^{2} = \frac{1}{N_2} \sum_{i=1}^{N_2} (y_i - y^{*})^{2}, \qquad (6)$$

$$\tilde{\sigma}_X^{2} \neq \tilde{\sigma}_Y^{2}.$$

The problem of assessing the significance of differences in the observed values $x_i$ and $y_i$ reduces to
testing the null hypothesis $H_0$ that the distribution functions $P_X(t)$ and $P_Y(t)$ are identical for all t.
The alternative hypothesis is formulated as the inequality $P_X(t) < P_Y(t)$.

The criterion of belonging of two samples to one and the same general totality (Wilcoxon's criterion) is
based on counting the number of inversions. For this, the observed values $x_i$ and $y_i$ are arranged in a
common sequence in ascending order of their values. The resulting non-decreasing sequence contains
$N_1 + N_2$ values, and if the hypothesis $P_X(t) = P_Y(t)$ is correct, the values of both sequences
$x_1, x_2, \ldots, x_{N_1}$ and $y_1, y_2, \ldots, y_{N_2}$ are well mixed. The degree of mixing is determined by
the number of inversions of the members of the first sequence relative to the second. If in the common ordered
sequence a certain value of x is preceded by one value of y, there is one inversion. If a value of x is preceded
by k values of y, then this value of x has k inversions.

Let's denote by $u_i$ the number of inversions for the value $x_i$ relative to the preceding values of y.
Then the total number of inversions (for all values of the sequence $x_1, x_2, \ldots, x_{N_1}$ relative to the
values of the sequence $y_1, y_2, \ldots, y_{N_2}$) is determined by the sum

$$u = u_1 + u_2 + \ldots + u_{N_1}.$$
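
The inversion count described above can be sketched in a few lines of Python. The function name `count_inversions` is hypothetical, and the handling of ties is an assumption (a y value equal to an x value is not counted as preceding it), since the text does not say how equal observations are treated.

```python
from bisect import bisect_left


def count_inversions(x, y):
    """Total number of y values that precede each x value in the merged ordering."""
    y_sorted = sorted(y)
    # bisect_left gives the number of y values strictly smaller than x_i.
    return sum(bisect_left(y_sorted, x_i) for x_i in x)
```

Sorting y once and using binary search keeps the cost at O((N1 + N2) log N2) instead of comparing every pair of observations.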

The null hypothesis $H_0$ is rejected if the number u goes beyond the boundary selected in accordance with the
significance level. The boundary is determined from the fact that, for sample sizes $N_1 > 10$ and
$N_2 > 10$, the number of inversions u is approximately normally distributed with center

$$M_u = \frac{N_1 N_2}{2}, \qquad (7)$$

and variance

$$D_u = \frac{N_1 N_2}{12}\,(N_1 + N_2 + 1). \qquad (8)$$


At a significance level q and with the number of inversions normally distributed, the probability that the value
of u does not fall into the critical area (which means that the null hypothesis is not refuted) is [ ]:

$$P(|M_u - u| < \varepsilon) = 1 - q = 2\Phi\!\left(\frac{\varepsilon}{\sigma_u}\right), \qquad (9)$$

where $\Phi$ is the Laplace function and $\varepsilon$ sets the maximum deviation of the resulting estimate from
the true value, i.e. $\varepsilon$ represents the absolute value of the error in determining the values of the
desired characteristics. At the same time, the confidence probability

$$P_d = P(|M_u - u| < \varepsilon)$$

indicates the probability of achieving the specified accuracy $\varepsilon$.

Let's fix the value of the confidence probability $P_d$; the left and right critical boundaries
will then be equal, respectively, to:

$$u_1 = M_u - t_\alpha \sigma_u, \qquad (10)$$
$$u_2 = M_u + t_\alpha \sigma_u, \qquad (11)$$

where $\sigma_u = \sqrt{D_u}$ is the standard deviation of the number of inversions and $t_\alpha$ is the root of
the equation $2\Phi(t_\alpha) = P_d$.

The lower the significance level q, the less likely it is that the tested hypothesis will be rejected
when it is true, i.e., that a type I error will be made. But as the significance level decreases, the range
of admissible values expands, which increases the probability of making the wrong decision, i.e., of
committing a type II error. Typically, the significance level is selected on the basis that the
relevant events in the given research situation are (with some risk) "practically impossible" (q = 10%, 5%,
2%, 1%, etc.).

Using the criterion discussed above, let's carry out an experimental study of whether two samples of the
network traffic of telecommunication systems and networks belong to the same general totality. For this, let's
form the samples (1)-(2) from 100 time readings of randomly selected network traffic segments corresponding to
different telecommunication and information services (YouTube (720p), YouTube (360p), Skype (voice), Skype
(video), E-mail, HTTP, FTP), so that $N_1 = N_2 = 100$. Let's estimate the sample means (3)-(4) and the
dispersions (5)-(6), as well as the expectation (7) and the dispersion (8) of the number of inversions:

$$M_u = \frac{N_1 N_2}{2} = 5000, \qquad D_u = \frac{N_1 N_2}{12}\,(N_1 + N_2 + 1) = 167500, \qquad \sigma_u \approx 409.3.$$

Let's take the significance level q = 10%. Using (9) and (10)-(11), let's calculate the values of the
left and right critical boundaries:

$$P_d = 1 - q = 0.9,$$

$$t_\alpha = \Phi^{-1}\!\left(\frac{1 - q}{2}\right) = \Phi^{-1}(0.45) \approx 1.65,$$

$$u_1 = M_u - t_\alpha \sigma_u \approx 4324.7,$$
$$u_2 = M_u + t_\alpha \sigma_u \approx 5675.3.$$
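
The arithmetic of this worked example can be reproduced with a short Python sketch. The function name `inversion_bounds` is hypothetical. The quantile is taken from `statistics.NormalDist`, where the 1 - q/2 quantile of the standard normal distribution plays the role of the inverse Laplace function at (1 - q)/2; it evaluates to about 1.645, so the boundaries differ by roughly two inversions from the values obtained with the rounded 1.65.

```python
import math
from statistics import NormalDist


def inversion_bounds(n1, n2, q):
    """Return (mean, variance, sigma, lower, upper) for the number of inversions."""
    mean = n1 * n2 / 2.0                          # formula (7)
    variance = n1 * n2 * (n1 + n2 + 1) / 12.0     # formula (8)
    sigma = math.sqrt(variance)
    t = NormalDist().inv_cdf(1.0 - q / 2.0)       # two-sided quantile for level q
    return mean, variance, sigma, mean - t * sigma, mean + t * sigma


mean, variance, sigma, lower, upper = inversion_bounds(100, 100, 0.10)
```

The same function can be reused for other sample sizes or significance levels, for example q = 0.05.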

Let's consider the first case, in which the observed data of YouTube (720p) network traffic act as the
sample $x_1, x_2, \ldots, x_{100}$, and the data of YouTube (720p), YouTube (360p), Skype (voice), Skype
(video), E-mail, HTTP and FTP network traffic act in turn as the sample $y_1, y_2, \ldots, y_{100}$.

The results of the study of whether the respective samples of network traffic belong to one and
the same general totality are given in Table 6. The studies have been conducted for different representations
of the network traffic: as the number of packets transmitted per unit of time and as the number
of bits transmitted per second.

The results of the experimental studies given in Table 6 indicate that the network traffic of the YouTube
(720p) service differs in its statistical properties from the network traffic of the other services of the
telecommunication system. By the criterion of belonging of the network traffic samples to one and the same
general totality (Wilcoxon criterion), for both representations of the network traffic (packet/s, bit/s) the
number of observed inversions u for the other services falls outside the boundaries selected in accordance with
the significance level, and the null hypothesis $H_0$ is rejected. At the same time, the Wilcoxon criterion gives
a reliable mechanism for detecting YouTube (720p) network traffic: the number of inversions u for the
corresponding service is, as expected, close to the theoretical value $M_u = 5000$, and the null hypothesis is
not rejected.



Table 6 - Results of research of belonging of network traffic samples to one and the same general
totality (network traffic YouTube (720p))

                         Traffic as packet/s             Traffic as bit/s
Traffic type (service)   Inversions u   Decision on H0   Inversions u   Decision on H0
YouTube (720p)           5005           Accepted         5006           Accepted
YouTube (360p)           542            Rejected         572            Rejected
Skype (voice)            302            Rejected         300            Rejected
Skype (video)            2140           Rejected         364            Rejected
E-mail                   54             Rejected         47             Rejected
HTTP                     1732           Rejected         1011           Rejected
FTP                      9539           Rejected         9960           Rejected
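
The decisions in Table 6 follow directly from comparing each inversion count with the critical boundaries computed earlier. The Python sketch below replays that comparison for the packet/s column; the function name `decide` is hypothetical, while the boundaries 4324.7 and 5675.3 and the inversion counts are taken from the text and from Table 6.

```python
# Critical boundaries for N1 = N2 = 100 and q = 10 % (values from the text).
LOWER, UPPER = 4324.7, 5675.3

# Inversion counts from Table 6, packet/s representation.
TABLE_6_PACKETS = {
    "YouTube (720p)": 5005,
    "YouTube (360p)": 542,
    "Skype (voice)": 302,
    "Skype (video)": 2140,
    "E-mail": 54,
    "HTTP": 1732,
    "FTP": 9539,
}


def decide(u, lower=LOWER, upper=UPPER):
    """Accept H0 when the inversion count lies between the critical boundaries."""
    return "Accepted" if lower < u < upper else "Rejected"


decisions = {service: decide(u) for service, u in TABLE_6_PACKETS.items()}
```

Only the YouTube (720p) sample compared with itself stays inside the boundaries, which matches the decisions listed in the table.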



Let's consider the second case, in which the observed data of YouTube (360p) network traffic act as the
sample $x_1, x_2, \ldots, x_{100}$, and the data of the other network traffic act in turn as the sample
$y_1, y_2, \ldots, y_{100}$. The results obtained are shown in Table 7.

The data shown in Table 7 confirm the correctness of the detection of YouTube (360p) network traffic:
in both representations, the number of observed inversions u is close to the theoretical value
$M_u = 5000$ (u = 5006 and u = 5008, respectively), and the null hypothesis is not rejected. It should also be
noted that by the Wilcoxon criterion the YouTube (360p) network traffic is statistically distinguishable from
the traffic of most other services. HTTP traffic is the exception: in each of the studied representations
(packet/s and bit/s) it is statistically indistinguishable from YouTube (360p) traffic, and by the Wilcoxon
criterion the hypothesis that these network traffic samples belong to the same general totality is not rejected.



Table 7 - Results of research of belonging of network traffic samples to one and the same general totality
(network traffic YouTube (360p))

                         Traffic as packet/s             Traffic as bit/s
Traffic type (service)   Inversions u   Decision on H0   Inversions u   Decision on H0
YouTube (720p)           9555           Rejected         9525           Rejected
YouTube (360p)           5006           Accepted         5008           Accepted
Skype (voice)            7364           Rejected         2900           Rejected
Skype (video)            9502           Rejected         7542           Rejected
E-mail                   1066           Rejected         590            Rejected
HTTP                     5333           Accepted         4766           Accepted
FTP                      989            Rejected         998            Rejected



The results of the study of the statistical properties of Skype (voice) network traffic and of testing the
hypothesis that the statistical data belong to the same general totality are shown in Table 8.

Table 8 - Results of research of belonging of network traffic samples to one and the same general totality
(network traffic Skype (voice))

                         Traffic as packet/s             Traffic as bit/s
Traffic type (service)   Inversions u   Decision on H0   Inversions u   Decision on H0
YouTube (720p)           9795           Rejected         9797           Rejected
YouTube (360p)           2764           Rejected         7171           Rejected
Skype (voice)            5003           Accepted         5008           Accepted
Skype (video)            10000          Rejected         10000          Rejected
E-mail                   333            Rejected         212            Rejected
HTTP                     3742           Rejected         5684           Rejected
FTP                      10000          Rejected         10000          Rejected

The data presented in Table 8 show the difference, on average, between the properties of Skype (voice)
network traffic and the properties of the traffic of the other services of the telecommunication system. The
observed number of inversions lies in the critical area, indicating that the tested samples belong to different
general totalities. In other words, the Wilcoxon criterion allows Skype (voice) traffic to be detected with high
reliability and distinguished from other network traffic.

The results of the study of the Skype (video) network traffic properties are shown in Table 9. The
analysis of the results shows that Skype (video) traffic is statistically distinguishable from the traffic of
other services, and the Wilcoxon criterion allows the corresponding service of a telecommunication system to be
detected.

Table 10 shows the results of the study of whether network traffic samples belong to one and the same general
totality, obtained by analyzing the statistical properties of E-mail network traffic. Tables 11 and 12 show the
results of similar studies of HTTP and FTP network traffic, respectively. The analysis shows that the network
traffic of the E-mail and FTP services is statistically different, by the Wilcoxon criterion, from the traffic of
other services, and no similarity is observed for them in any of the investigated samples. At the same time, the
statistical properties of HTTP traffic are similar to those of the traffic observed in the operation of the
YouTube (360p) service. This is also confirmed by the data of Table 7.



Table 9 - Results of testing whether network traffic samples belong to one and the same general population
(network traffic Skype (video))

Traffic type       Representation as packet/s      Representation as bit/s
(service)          Inversions u    Decision        Inversions u    Decision
YouTube (720p)     7932            Rejected        9733            Rejected
YouTube (360p)     604             Rejected        2586            Rejected
Skype (voice)      0               Rejected        0               Rejected
Skype (video)      5009            Accepted        5009            Accepted
E-mail             3               Rejected        0               Rejected
HTTP               2699            Rejected        3108            Rejected
FTP                9901            Rejected        10000           Rejected


Table 10 - Results of testing whether network traffic samples belong to one and the same general population
(network traffic E-mail)

Traffic type       Representation as packet/s      Representation as bit/s
(service)          Inversions u    Decision        Inversions u    Decision
YouTube (720p)     947             Rejected        961             Rejected
YouTube (360p)     9016            Rejected        9483            Rejected
Skype (voice)      9774            Rejected        9890            Rejected
Skype (video)      999             Rejected        10000           Rejected
E-mail             5001            Accepted        5001            Accepted
HTTP               8095            Rejected        8320            Rejected
FTP                10000           Rejected        10000           Rejected


Table 11 - Results of testing whether network traffic samples belong to one and the same general population
(network traffic HTTP)

Traffic type       Representation as packet/s      Representation as bit/s
(service)          Inversions u    Decision        Inversions u    Decision
YouTube (720p)     8314            Rejected        9056            Rejected
YouTube (360p)     4734            Accepted        5288            Rejected
Skype (voice)      6297            Rejected        4323            Rejected
Skype (video)      7327            Rejected        6923            Rejected
E-mail             1990            Rejected        1770            Rejected
HTTP               5003            Accepted        5009            Accepted
FTP                9893            Rejected        998             Rejected

Table 12 - Results of testing whether network traffic samples belong to one and the same general population
(network traffic FTP)

Traffic type       Representation as packet/s      Representation as bit/s
(service)          Inversions u    Decision        Inversions u    Decision
YouTube (720p)     559             Rejected        137             Rejected
YouTube (360p)     29              Rejected        7               Rejected
Skype (voice)      0               Rejected        0               Rejected
Skype (video)      195             Rejected        0               Rejected
E-mail             2               Rejected        0               Rejected
HTTP               204             Rejected        4               Rejected
FTP                5007            Accepted        5007            Accepted



The final results of the experimental studies are summarized in Tables 13 and 14, which list the number of
inversions and the results of testing the hypothesis of homogeneity of network traffic for the two forms of
representation (packet/s and bits/s, respectively).

Table 13 - The number of inversions and the results of testing the hypothesis of homogeneity of network traffic (packet/s)

                 YouTube (720p)  YouTube (360p)  Skype (voice)  Skype (video)  E-mail       HTTP         FTP
YouTube (720p)   5005 «+»        9555 «-»        9795 «-»       7932 «-»       947 «-»      8314 «-»     559 «-»
YouTube (360p)   542 «-»         5006 «+»        2764 «-»       604 «-»        9016 «-»     4734 «+»     29 «-»
Skype (voice)    302 «-»         7364 «-»        5003 «+»       0 «-»          9774 «-»     6297 «-»     0 «-»
Skype (video)    2140 «-»        9502 «-»        10000 «-»      5009 «+»       999 «-»      7327 «-»     195 «-»
E-mail           54 «-»          1066 «-»        333 «-»        3 «-»          5001 «+»     1990 «-»     2 «-»
HTTP             1732 «-»        5333            3742 «-»       2699 «-»       8095 «-»     5003 «+»     204 «-»
FTP              9539 «-»        989 «-»         10000 «-»      9901 «-»       10000 «-»    9893 «-»     5007 «+»






Table 14 - The number of inversions and the results of testing the hypothesis of homogeneity of network traffic (bits/s)

                 YouTube (720p)  YouTube (360p)  Skype (voice)  Skype (video)  E-mail       HTTP         FTP
YouTube (720p)   5006 «+»        9525 «-»        9797 «-»       9733 «-»       961 «-»      9056 «-»     137 «-»
YouTube (360p)   572 «-»         5008 «+»        7171 «-»       2586 «-»       9483 «-»     5288 «-»     7 «-»
Skype (voice)    300 «-»         2900 «-»        5008 «+»       0 «-»          9890 «-»     4323 «-»     0 «-»
Skype (video)    364 «-»         7542 «-»        10000 «-»      5009 «+»       10000 «-»    6923 «-»     0 «-»
E-mail           47 «-»          590 «-»         212 «-»        0 «-»          5001 «+»     1770 «-»     0 «-»
HTTP             1011 «-»        4766            5684 «-»       3108 «-»       8320 «-»     5009 «+»     4 «-»
FTP              9960 «-»        991 «-»         10000 «-»      10000 «-»      10000 «-»    998 «-»      5007 «+»







The following designations are used in Tables 13 and 14:

«+» - the hypothesis of network traffic homogeneity is not rejected;

«-» - the hypothesis of network traffic homogeneity is rejected.

The results of the hypothesis testing shown in Tables 13 and 14 are symmetrical with respect to the main
diagonal, which confirms the reliability of the results obtained in each specific experiment.
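
One way to see why symmetric decisions are expected is the following short derivation. It is an idealization: it assumes that the two cells mirrored across the diagonal compare the same pair of samples x (size n) and y (size m), that no two observations are equal, and that the critical region is two-sided and centered on nm/2. None of these assumptions is stated explicitly in this section.

```latex
u_{xy} = \#\{(i,j) : x_i > y_j\}, \qquad
u_{yx} = \#\{(i,j) : y_j > x_i\}, \qquad
u_{xy} + u_{yx} = nm
\;\Longrightarrow\;
\left| u_{xy} - \tfrac{nm}{2} \right| = \left| u_{yx} - \tfrac{nm}{2} \right| .
```

Under these assumptions both inversion counts deviate from the expected value by the same amount, so a symmetric two-sided test gives the same decision in both directions. The mirrored counts in the tables do not sum exactly to a constant (for example, 9555 and 542 in Table 13), so the identity holds only approximately for the reported data.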

The analysis of the data in Tables 13 and 14 shows that, when samples of network traffic from different
telecommunication services are compared, the hypothesis of homogeneity is rejected in the majority of cases, i.e.,
the samples are correctly attributed to different processes. This finding may serve as the basis for one of the
elements of a network activity monitoring system: the Wilcoxon criterion can be used for the initial detection of a
telecommunication service.
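
The initial detection step could be organized along the following lines. This is a hypothetical sketch, not the monitoring system described by the authors: the function names, the dictionary of reference samples, the normal approximation, and the critical value z_crit = 1.96 are assumptions made for illustration, and ties between observations are ignored.

```python
import math

def inversions(x, y):
    """Number of pairs (x_i, y_j) with x_i > y_j (ties ignored in this sketch)."""
    return sum(1 for xi in x for yj in y if xi > yj)

def matching_services(observed, references, z_crit=1.96):
    """Return the names of reference services for which homogeneity is not rejected.

    observed   -- sample of the monitored traffic (e.g. packet/s values)
    references -- dict mapping a service name to its reference sample (hypothetical)
    z_crit     -- assumed two-sided critical value of the normal approximation
    """
    matches = []
    for name, ref in references.items():
        n, m = len(observed), len(ref)
        u = inversions(observed, ref)
        mean = n * m / 2.0
        std = math.sqrt(n * m * (n + m + 1) / 12.0)
        if abs(u - mean) <= z_crit * std:
            matches.append(name)
    return matches
```

An empty result would mean that the observed traffic matches none of the reference services, while more than one match (as with HTTP and YouTube (360p) above) would call for an additional criterion.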

One of the most important characteristics of a random variable is its variance (dispersion), which allows
the random parameters of the studied process to be compared in order to determine whether their characteristics
differ significantly or coincide. The following section provides a variance analysis of the network traffic of
individual services and of TCS services: based on an assessment of the ratio of the sample variances, the statistical
hypotheses about the homogeneity of the simulation results with respect to this main index are checked.
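
As a rough illustration of what an assessment of the ratio of sample variances involves, the sketch below computes the ratio with the larger variance in the numerator, together with the corresponding degrees of freedom. Treating this as Fisher's F-ratio is an assumption; the source does not specify the exact test used in the following section, and the function name is hypothetical.

```python
import statistics

def variance_ratio(x, y):
    """Return (F, df_numerator, df_denominator) for two samples.

    F is the ratio of the unbiased sample variances with the larger one on top,
    so F >= 1. Both samples are assumed to have nonzero variance.
    """
    vx = statistics.variance(x)
    vy = statistics.variance(y)
    if vx >= vy:
        return vx / vy, len(x) - 1, len(y) - 1
    return vy / vx, len(y) - 1, len(x) - 1
```

The hypothesis of equal variances would then be rejected when F exceeds the critical value of the F-distribution for the chosen significance level and the returned degrees of freedom; that critical value has to be taken from tables or a statistics library.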

CONCLUSIONS 

1. The results of experimental studies of the statistical properties of network traffic using the correlation analysis
of time series confirm the theoretical assumption that the BDS-test gives different values for different types of traffic.
These values can be taken as references, and on this basis a process for detecting the network activity of an individual
service of a telecommunications network can be organized. Thus, the experimental data confirm the theoretical assumption
that the values of the BDS-test can be used to detect traces of malicious software in network traffic.

2. The results of the correlation analysis of network traffic based on the BDS-test are recommended for use,
first, as part of the analytical component of modern anti-virus systems. Second, the correlation analysis of network
traffic may be used to organize one of the main elements of a network activity monitoring system, both as a sensor
subsystem (sensors that collect traffic information) and as the analytical part (a component of the decision-making module).

3. To detect individual services of a telecommunications system in the observed network traffic, a unit is used
that assesses the significance of differences between two or more samples (series) of independent observations of
network traffic (the Wilcoxon criterion). It makes it possible to establish, with a given accuracy and reliability, whether
different data streams belong to the same general population.

4. Analysis of the experimental data shows that, when samples of network traffic from different telecommunication
services are compared, the hypothesis of homogeneity is rejected in the majority of cases, i.e., the samples are correctly
attributed to different processes. This finding is recommended as the basis for one of the elements of a network activity
monitoring system: the Wilcoxon criterion is proposed for the primary detection of a telecommunication service.

5. A promising direction for further research is the assessment of the ratio of sample variances and the testing of
statistical hypotheses about the homogeneity of simulation results in terms of variance. The results of these studies are
intended for comparing the random parameters of the tested process in order to determine whether their characteristics
differ significantly or coincide.